Significant Triples: Adjective+Noun+Verb Combinations

Authors

  • HEIKE ZINSMEISTER
  • ULRICH HEID
Abstract

We investigate the identification and, to some extent, the classification of collocational word groups that consist of an adjectival modifier (A), an accusative object noun (N), and a verb (V) by parsing a newspaper corpus with a lexicalized probabilistic grammar.¹ Data triples are extracted from the resulting Viterbi parses and then evaluated with a statistical association measure. The extraction results are compared to predefined descriptive classes of ANV-triples. We also use a decision tree algorithm to classify part of the data obtained, on the basis of a small set of manually classified examples.

Much of the candidate data is lexicographically relevant. The triples include idiomatic combinations (e.g. (sich) eine goldene Nase verdienen, ‘get rich’, lit. ‘earn oneself a golden nose’) and combinations of N+V and N+A collocations (e.g. (eine) klare Absage erteilen, ‘refuse resolutely’, lit. ‘give a clear refusal’), alongside cases where N+V or N+A collocations occur in combination with other (not necessarily collocational) context partners. To extract such data from text corpora, a grammar is needed that captures verb+object relations: simple pattern matching on part-of-speech sequences is not sufficient. Statistical tools then make it possible to order the data in a way that is useful for subsequent manual selection by lexicographers.

¹ This work has been carried out in the context of the Transferbereich 32: Automatische Exzerption, a DFG-funded project aiming at the creation of support tools for the corpus-based updating of printed dictionaries in lexicography, carried out in cooperation with the publishers Langenscheidt KG and Duden BIFAB AG.
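The ranking step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the association measure used, so the code below assumes a simple PMI-style score (observed triple frequency against the frequency expected if A, N, and V were independent). The function name `rank_triples`, the `min_freq` cutoff, and the toy counts are all hypothetical and only serve the illustration; real input would come from the triples extracted from the parses.

```python
import math
from collections import Counter


def rank_triples(triples, min_freq=2):
    """Rank (adjective, noun, verb) triples by a PMI-style score.

    Assumption: the score is log2(observed / expected), where the
    expected count treats the three slots as independent. This is an
    illustrative stand-in, not the measure used in the paper.
    """
    total = len(triples)
    triple_counts = Counter(triples)
    adj_counts = Counter(a for a, _, _ in triples)
    noun_counts = Counter(n for _, n, _ in triples)
    verb_counts = Counter(v for _, _, v in triples)

    scored = []
    for (a, n, v), freq in triple_counts.items():
        if freq < min_freq:
            continue  # skip rare triples, whose scores are unreliable
        # Expected count under independence: total * p(a) * p(n) * p(v)
        expected = adj_counts[a] * noun_counts[n] * verb_counts[v] / total**2
        scored.append(((a, n, v), freq, math.log2(freq / expected)))

    # Highest association first, as a lexicographer would review them
    return sorted(scored, key=lambda item: item[2], reverse=True)


# Hypothetical toy data (lemmatized); the counts are invented.
toy_triples = (
    [("golden", "Nase", "verdienen")] * 3
    + [("klar", "Absage", "erteilen")] * 4
    + [("neu", "Auto", "kaufen")] * 2
    + [("neu", "Haus", "kaufen")] * 2
    + [("neu", "Absage", "erteilen")] * 1
)

ranked = rank_triples(toy_triples)
for triple, freq, score in ranked:
    print(triple, freq, round(score, 3))
```

On this toy input the idiomatic triple ranks first because its three words never occur outside the combination. A PMI-style score is known to favour low-frequency items, which is why the sketch applies a frequency cutoff before scoring.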


Similar resources

A Novel Machine Learning Approach for Sentiment Analysis Based on Adverb-Adjective-Noun-Verb (AANV) Combinations

The capability to study facts (data) about each living as well as non-living entity and derive conclusions (information) from those facts and then store them for future use and reference (knowledge), is an art which no other species has been gifted. This skill has been enriched over the time. With the advent of the internet, communicating across the globe has virtually been reduced to our palm. So...


Lexical and Grammatical Collocations in Writing Production of EFL Learners

Lewis (1993) recognized the significance of word combinations including collocations by presenting the lexical approach. Because of the crucial role of collocation in vocabulary acquisition, this research set out to evaluate the rate of collocations in Iranian EFL learners' writing production across L1 and L2. In addition, L1 interference with L2 collocational use in the learners' writing samples was st...


Methods for the Qualitative Evaluation of Lexical Association Measures

This paper presents methods for a qualitative, unbiased comparison of lexical association measures and the results we have obtained for adjective-noun pairs and preposition-noun-verb triples extracted from German corpora. In our approach, we compare the entire list of candidates, sorted according to the particular measures, to a reference set of manually identified “true positives”. We also sho...


Speeded recognition of ungrammaticality: Double violations*

A model of sentence comprehension postulating that Subject-Verb-Object relations are specified prior to Noun-Adjective relations received support from a study of the speed at which sentences with various kinds of violations could be rejected. Compatible with the sequential model was the finding that Noun-Verb and Adjective-Noun double violations did not result in shorter RTs than Noun-Verb sing...


Determinants of Adjective-Noun Plausibility

This paper explores the determinants of adjective-noun plausibility by using correlation analysis to compare judgements elicited from human subjects with five corpus-based variables: co-occurrence frequency of the adjective-noun pair, noun frequency, conditional probability of the noun given the adjective, the log-likelihood ratio, and Resnik’s (1993) selectional association measure. The highes...




Publication date: 2003